# Install required libraries
packages <- c('leaflet','dplyr','data.table','sp', 'rgeos', 'raster',
'rgdal','GISTools','magrittr','BSDA', 'PASWR','broom','tidyverse','gtools')
for(p in packages){
if(!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Loading required package: leaflet
package 㤼㸱leaflet㤼㸲 was built under R version 3.5.3Loading required package: dplyr
package 㤼㸱dplyr㤼㸲 was built under R version 3.5.3
Attaching package: 㤼㸱dplyr㤼㸲
The following objects are masked from 㤼㸱package:stats㤼㸲:
filter, lag
The following objects are masked from 㤼㸱package:base㤼㸲:
intersect, setdiff, setequal, union
Loading required package: data.table
package 㤼㸱data.table㤼㸲 was built under R version 3.5.3data.table 1.12.8 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: 㤼㸱data.table㤼㸲
The following objects are masked from 㤼㸱package:dplyr㤼㸲:
between, first, last
Loading required package: sp
Loading required package: rgeos
package 㤼㸱rgeos㤼㸲 was built under R version 3.5.3rgeos version: 0.5-2, (SVN revision 621)
GEOS runtime version: 3.6.1-CAPI-1.10.1
Linking to sp version: 1.4-1
Polygon checking: TRUE
Loading required package: raster
package 㤼㸱raster㤼㸲 was built under R version 3.5.3
Attaching package: 㤼㸱raster㤼㸲
The following object is masked from 㤼㸱package:data.table㤼㸲:
shift
The following object is masked from 㤼㸱package:dplyr㤼㸲:
select
Loading required package: rgdal
package 㤼㸱rgdal㤼㸲 was built under R version 3.5.3rgdal: version: 1.4-8, (SVN revision 845)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
Path to GDAL shared files: C:/Users/Lim Wei JIi/Documents/R/win-library/3.5/rgdal/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ.4 shared files: C:/Users/Lim Wei JIi/Documents/R/win-library/3.5/rgdal/proj
Linking to sp version: 1.4-1
Loading required package: GISTools
package 㤼㸱GISTools㤼㸲 was built under R version 3.5.3Loading required package: maptools
package 㤼㸱maptools㤼㸲 was built under R version 3.5.3Checking rgeos availability: TRUE
Loading required package: RColorBrewer
package 㤼㸱RColorBrewer㤼㸲 was built under R version 3.5.2Loading required package: MASS
Attaching package: 㤼㸱MASS㤼㸲
The following objects are masked from 㤼㸱package:raster㤼㸲:
area, select
The following object is masked from 㤼㸱package:dplyr㤼㸲:
select
Loading required package: magrittr
package 㤼㸱magrittr㤼㸲 was built under R version 3.5.3
Attaching package: 㤼㸱magrittr㤼㸲
The following object is masked from 㤼㸱package:raster㤼㸲:
extract
Loading required package: BSDA
package 㤼㸱BSDA㤼㸲 was built under R version 3.5.3Loading required package: lattice
Attaching package: 㤼㸱BSDA㤼㸲
The following object is masked from 㤼㸱package:datasets㤼㸲:
Orange
Loading required package: PASWR
package 㤼㸱PASWR㤼㸲 was built under R version 3.5.3Loading required package: e1071
package 㤼㸱e1071㤼㸲 was built under R version 3.5.3
Attaching package: 㤼㸱e1071㤼㸲
The following object is masked from 㤼㸱package:raster㤼㸲:
interpolate
Attaching package: 㤼㸱PASWR㤼㸲
The following objects are masked from 㤼㸱package:BSDA㤼㸲:
Chips, CIsim, Combinations, Depend, EDA, Engineer, Grades, Kinder, normarea, nsize, ntester, Phone, Rat,
Salinity, SIGN.test, Soccer, SRS, tsum.test, Wool, z.test, zsum.test
Loading required package: broom
package 㤼㸱broom㤼㸲 was built under R version 3.5.3Loading required package: tidyverse
package 㤼㸱tidyverse㤼㸲 was built under R version 3.5.3Error: package or namespace load failed for 㤼㸱tidyverse㤼㸲 in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called 㤼㸱xml2㤼㸲
Installing package into 㤼㸱C:/Users/Lim Wei JIi/Documents/R/win-library/3.5㤼㸲
(as 㤼㸱lib㤼㸲 is unspecified)
also installing the dependency 㤼㸱xml2㤼㸲
There is a binary version available but the source version is later:
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/tidyverse_1.3.0.zip'
Content type 'application/zip' length 439602 bytes (429 KB)
downloaded 429 KB
package ‘tidyverse’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Lim Wei JIi\AppData\Local\Temp\RtmpsXv21J\downloaded_packages
installing the source package 㤼㸱xml2㤼㸲
trying URL 'https://cran.rstudio.com/src/contrib/xml2_1.3.1.tar.gz'
Content type 'application/x-gzip' length 271674 bytes (265 KB)
downloaded 265 KB
* installing *source* package 'xml2' ...
** package 'xml2' successfully unpacked and MD5 sums checked
** libs
*** arch - i386
rm -f connection.o init.o xml2_doc.o xml2_init.o xml2_namespace.o xml2_node.o xml2_output.o xml2_schema.o xml2_url.o xml2_xpath.o xml2.dll
"C:/PROGRA~1/R/R-35~1.0/bin/i386/Rscript.exe" "../tools/winlibs.R" 2.9.8
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c connection.cpp -o connection.o
C:/Rtools/mingw_32/bin/gcc -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O3 -Wall -std=gnu99 -mtune=generic -c init.c -o init.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_doc.cpp -o xml2_doc.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_init.cpp -o xml2_init.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_namespace.cpp -o xml2_namespace.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_node.cpp -o xml2_node.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_output.cpp -o xml2_output.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_schema.cpp -o xml2_schema.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_url.cpp -o xml2_url.o
C:/Rtools/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_xpath.cpp -o xml2_xpath.o
C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o xml2.dll tmp.def connection.o init.o xml2_doc.o xml2_init.o xml2_namespace.o xml2_node.o xml2_output.o xml2_schema.o xml2_url.o xml2_xpath.o -L../windows/libxml2-2.9.8/lib/i386 -lxml2 -llzma -liconv -lz -lws2_32 -LC:/PROGRA~1/R/R-35~1.0/bin/i386 -lR
installing to C:/Users/Lim Wei JIi/Documents/R/win-library/3.5/xml2/libs/i386
*** arch - x64
rm -f connection.o init.o xml2_doc.o xml2_init.o xml2_namespace.o xml2_node.o xml2_output.o xml2_schema.o xml2_url.o xml2_xpath.o xml2.dll
"C:/PROGRA~1/R/R-35~1.0/bin/x64/Rscript.exe" "../tools/winlibs.R" 2.9.8
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c connection.cpp -o connection.o
C:/Rtools/mingw_64/bin/gcc -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -std=gnu99 -mtune=generic -c init.c -o init.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_doc.cpp -o xml2_doc.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_init.cpp -o xml2_init.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_namespace.cpp -o xml2_namespace.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_node.cpp -o xml2_node.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_output.cpp -o xml2_output.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_schema.cpp -o xml2_schema.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_url.cpp -o xml2_url.o
C:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.0/include" -DNDEBUG -I../inst/include -I../windows/libxml2-2.9.8/include/libxml2 -DLIBXML_STATIC -O2 -Wall -mtune=generic -c xml2_xpath.cpp -o xml2_xpath.o
C:/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o xml2.dll tmp.def connection.o init.o xml2_doc.o xml2_init.o xml2_namespace.o xml2_node.o xml2_output.o xml2_schema.o xml2_url.o xml2_xpath.o -L../windows/libxml2-2.9.8/lib/x64 -lxml2 -llzma -liconv -lz -lws2_32 -LC:/PROGRA~1/R/R-35~1.0/bin/x64 -lR
installing to C:/Users/Lim Wei JIi/Documents/R/win-library/3.5/xml2/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'xml2'
finding HTML links ... done
as_list html
as_xml_document html
download_xml html
oldclass html
read_xml html
url_absolute html
url_escape html
url_parse html
write_xml html
xml2_example html
xml_attr html
xml_cdata html
xml_children html
xml_comment html
xml_dtd html
xml_find_all html
xml_missing html
xml_name html
xml_new_document html
xml_ns html
xml_ns_strip html
xml_path html
xml_replace html
xml_serialize html
Rd warning: C:/Users/Lim Wei JIi/AppData/Local/Temp/RtmpaA7I2o/R.INSTALL1084767614e/xml2/man/xml_serialize.Rd:15: file link 'connection' in package 'base' does not exist and so has been treated as a topic
xml_set_namespace html
xml_structure html
Rd warning: C:/Users/Lim Wei JIi/AppData/Local/Temp/RtmpaA7I2o/R.INSTALL1084767614e/xml2/man/xml_structure.Rd:17: file link 'connection' in package 'base' does not exist and so has been treated as a topic
xml_text html
xml_type html
xml_url html
xml_validate html
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (xml2)
In R CMD INSTALL
The downloaded source packages are in
‘C:\Users\Lim Wei JIi\AppData\Local\Temp\RtmpsXv21J\downloaded_packages’
# Install required libraries
data <- read.csv("data/realis2018.csv")
head(data)
One day, your mum tells you that we just won the first prize of TOTO which worth 2,500,000 SGD. After you discuss with your family, you decide to buy a flat and plan for the investment. During the period of time, you contact the company “Property Master@” to discuss more on the historical transaction of Singapore property. The sample data givenis “realis2018.csv”. For problems stated below, we use α= 5%.
# clean data
# filter no. of units = 1 cos some transactions are the whole damn building
data %<>% filter(No..of.Units==1)
# recode YISHUN and Yishun
data$Planning.Area <- recode(data$Planning.Area,"YISHUN"="Yishun")
You look around few different planning areas (Column R). When you ask the agent what is the mean Unit price(psm) for Newton flats (Column G), she claims that the mean is higher than 26500. Do you agree with your agent’s suggestion? Explain and justify your answer.
Newton <- data %>% filter(Planning.Area=="Newton",Property.Type %in% c("Condominium", "Apartment", "Executive Condominium"), No..of.Units == "1")
head(Newton)
z.test(Newton$Unit.Price....psm, alternative = "greater", mu= 26500,sigma.x=sd(Newton$Unit.Price....psm))
One-sample z-Test
data: Newton$Unit.Price....psm
z = 1.881, p-value = 0.02998
alternative hypothesis: true mean is greater than 26500
95 percent confidence interval:
26598.75 Inf
sample estimates:
mean of x
27286.51
At 95% Confidence level, we have sufficient evidence to say that the mean price per square meter(psm) of flats in the Newton Area is more than $26500.
Your friend told you that Newton planning area may not be the best area to choose. He suggested you to consider other planning areas. This is a very difficult decision since you need to conduct a more comprehensive analysis and you also need to justify whether you still choose Newton or another planning area.
realis <- fread('data/realis2018.csv')
realis$pa <- toupper(realis$`Planning Area`)
centroids <- readOGR("data/MP14_PLNG_AREA_WEB_PL.shp")
OGR data source with driver: ESRI Shapefile
Source: "D:\ASSR\TakeHome\data\MP14_PLNG_AREA_WEB_PL.shp", layer: "MP14_PLNG_AREA_WEB_PL"
with 55 features
It has 12 fields
dgp <- spTransform(centroids, CRS("+proj=longlat +ellps=GRS80"))
one_unit <- subset(realis, realis$`No. of Units` == 1 & realis$`Transacted Price ($)` <= 2500000)
pa_units <- aggregate(realis$`No. of Units`,
by = list(realis$pa),
FUN = sum)
colnames(pa_units) = c('PA', 'Units')
m <- merge(dgp,pa_units, by.x ='PLN_AREA_N', by.y = 'PA')
pal <-
colorBin(palette = brewer.pal(10,"YlGnBu"),
domain = c(0,2000),
na.color = "#00000000",
bins=c(0,5,10,50,100,200,400,600,800,1000,1200,1400,1600,1800,2000))
n too large, allowed maximum for palette YlGnBu is 9
Returning the palette you asked for with that many colors
# create the base map, default will be openstreetmap if not selected
# added centroids point as well
leaflet(dgp) %>% addTiles() %>%
addPolygons(fillColor = ~pal(m$Units),
weight = 2,
opacity = 1,
color = "grey",
dashArray = "1",
fillOpacity = 0.8) %>%
addLegend("topright", pal, values=(0:2000),
title = "Transacted Units",
labFormat = labelFormat(suffix = " Units", between = '-'))
pa_units[order(-pa_units$Units),]
stock_data <- fread("data/stock2019Q4.csv")
stock_data$PA <- toupper(stock_data$PA)
stock <- merge(dgp,stock_data, by.x ='PLN_AREA_N', by.y = 'PA')
pal <-
colorBin(palette = brewer.pal(10,"YlGnBu"),
domain = c(0,2000),
na.color = "#00000000",
bins=c(0,500,1000,3000,5000,8000,10000,15000,20000,30000))
n too large, allowed maximum for palette YlGnBu is 9
Returning the palette you asked for with that many colors
# create the base map, default will be openstreetmap if not selected
# added centroids point as well
leaflet(dgp) %>% addTiles() %>%
addPolygons(fillColor = ~pal(stock$Total),
weight = 2,
opacity = 1,
color = "grey",
dashArray = "1",
fillOpacity = 0.8) %>%
addLegend("topright", pal, values=(0:2000),
title = "Total Stock",
labFormat = labelFormat(suffix = " Units", between = '-'))
stock_data[order(-stock_data$Total),]
PropType <- unique(data$Property.Type)
for (k in 1:length(unique(data$Property.Type))){
data_property <- data %>% filter(Property.Type==unique(data$Property.Type)[k])
res.aov <- aov(Unit.Price....psm.~Planning.Area,data=data_property)
summary(res.aov)
results <- tidy(TukeyHSD(res.aov,ordered=TRUE))
results_sorted <- results %>% separate(comparison, c("Bigger", "Smaller"),sep = "-")
rankings1 <- results_sorted %>% group_by(Bigger) %>% summarise(Count=n()) %>% arrange(desc(Count))
rankings2 <- results_sorted %>% filter(Smaller %!in% rankings1$Bigger) %>% group_by(Smaller) %>% summarise(Count=n())%>% arrange(Count)
names(rankings2)[1] <- "Bigger"
rankings <- rbind(rankings1,rankings2)
rankings$Rank <- seq.int(nrow(rankings))
rankings %<>% dplyr::select(-Count)
results_sorted <- left_join(results_sorted, rankings) %>% arrange(Rank)
rankings$TukeyRank <- NA
rankings$TukeyRank[1] <- 1
for( i in 2:nrow(rankings)){
if(results_sorted$adj.p.value[results_sorted$Bigger==rankings$Bigger[i-1]&results_sorted$Smaller == rankings$Bigger[i]]<=0.05){
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]+1
} else if(sum((results_sorted$Smaller[results_sorted$Bigger==rankings$Bigger[i-1]&results_sorted$adj.p.value<=0.05] %in% results_sorted$Smaller[results_sorted$Bigger==rankings$Bigger[i]&results_sorted$adj.p.value>0.05])>0)){
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]+1
} else {
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]
}
}
Tukey_ranked <- rankings %>% dplyr::select(Bigger,TukeyRank)
names(Tukey_ranked)[1] <- "Planning.Area"
Planning_Mean <- data_property %>% group_by(Planning.Area) %>% summarise(mean= mean(Unit.Price....psm.), sd= sd(Unit.Price....psm.))
Tukey_ranked <- left_join(Tukey_ranked, Planning_Mean)
assign(paste("aov_", PropType[k], sep = ""), tidy(res.aov))
assign(paste("Tukey_", PropType[k], sep = ""), as.data.frame(Tukey_ranked))
}
Joining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vector
head(Tukey_Apartment)
head(Tukey_Condominium)
head(`Tukey_Executive Condominium`)
head(`Tukey_Detached House`)
head(`Tukey_Semi-Detached House`)
head(`Tukey_Terrace House`)
summary(res.aov)
Df Sum Sq Mean Sq F value Pr(>F)
Planning.Area 14 2.071e+09 147906034 216.6 <2e-16 ***
Residuals 1731 1.182e+09 682842
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Detailed analysis
#We performed the ANOVA test to determine if all the mean PSM per planning areas were similar.
#At 95% Confidence level, as P value is less than 0.05, we have sufficient evidence to say to reject H0 and that the mean price per square meter(psm) of flats are significantly different
#To compare the group means, a post hoc test - TUKEY test - was performed per property type
#Given the extensive results output, the TUKEY results were sorted for clarity.
#For each property type, the differences were sorted, filtered for results with adjusted p-value <=0.05 and then ranked accordingly
#At 95% confidence intervals:
# For property type - apartments - Orchard has the most significantly differences between means in PSM. River Valley, Newton and Downtown Core are ranked 2nd.
# For property type - Condominium - Orchard and River Valley have the most significantly different differences between means in PSM. The Newton area came in 2nd.
# For property type - EC - Bishan has the most significantly different differences between means in PSM.Sengkang, Punggol came in 2nd.
# For property type - Detached House - Newton has the most significantly different differences between means in PSM. Tanglin and Novena / Southern islands (Sentosa) are ranked 2nd and 3rd respectively
# For property type - Semi-Detached House - the River Valley area has the most significantly different differences between means in PSM. Tanglin and Marine Parade are joint 2nd.
# For property type - Terrance House - Newton has the most significantly different differences between means. Rochor, River Valley and Novena are joint 2nd.
# output the damn tukey table and sort
# jayne do anova
for (k in 1:length(unique(data$Property.Type))){
data_property <- data %>% filter(Property.Type==unique(data$Property.Type)[k])
res.aov <- aov(Transacted.Price....~Planning.Area,data=data_property)
summary(res.aov)
results <- tidy(TukeyHSD(res.aov,ordered=TRUE))
results_sorted <- results %>% separate(comparison, c("Bigger", "Smaller"),sep = "-")
rankings1 <- results_sorted %>% group_by(Bigger) %>% summarise(Count=n()) %>% arrange(desc(Count))
rankings2 <- results_sorted %>% filter(Smaller %!in% rankings1$Bigger) %>% group_by(Smaller) %>% summarise(Count=n())%>% arrange(Count)
names(rankings2)[1] <- "Bigger"
rankings <- rbind(rankings1,rankings2)
rankings$Rank <- seq.int(nrow(rankings))
rankings %<>% dplyr::select(-Count)
results_sorted <- left_join(results_sorted, rankings) %>% arrange(Rank)
rankings$TukeyRank <- NA
rankings$TukeyRank[1] <- 1
for( i in 2:nrow(rankings)){
if(results_sorted$adj.p.value[results_sorted$Bigger==rankings$Bigger[i-1]&results_sorted$Smaller == rankings$Bigger[i]]<=0.05){
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]+1
} else if(sum((results_sorted$Smaller[results_sorted$Bigger==rankings$Bigger[i-1]&results_sorted$adj.p.value<=0.05] %in% results_sorted$Smaller[results_sorted$Bigger==rankings$Bigger[i]&results_sorted$adj.p.value>0.05])>0)){
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]+1
} else {
rankings$TukeyRank[i] <- rankings$TukeyRank[i-1]
}
}
Tukey_ranked <- rankings %>% dplyr::select(Bigger,TukeyRank)
names(Tukey_ranked)[1] <- "Planning.Area"
Planning_Mean <- data_property %>% group_by(Planning.Area) %>% summarise(mean= mean(Transacted.Price....), sd= sd(Transacted.Price....))
Tukey_ranked <- left_join(Tukey_ranked, Planning_Mean)
assign(paste("aov_", PropType[k], sep = ""), tidy(res.aov))
assign(paste("Tukey_", PropType[k], sep = ""), as.data.frame(Tukey_ranked))
}
Joining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vectorJoining, by = "Bigger"
Joining, by = "Planning.Area"
Column `Planning.Area` joining character vector and factor, coercing into character vector
head(Tukey_Apartment)
head(Tukey_Condominium)
head(`Tukey_Executive Condominium`)
head(`Tukey_Detached House`)
head(`Tukey_Semi-Detached House`)
head(`Tukey_Terrace House`)
summary(res.aov)
Df Sum Sq Mean Sq F value Pr(>F)
Planning.Area 14 1.689e+13 1.206e+12 65.86 <2e-16 ***
Residuals 1731 3.170e+13 1.831e+10
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Detailed analysis
#We performed the ANOVA test to determine if the mean transacted price per planning areas were similar.
#At 95% Confidence level, as P value is less than 0.05, we have sufficient evidence to say to reject H0 and that the mean transacted price of flats are significantly different
#To compare the group means, a post hoc test - TUKEY test - was performed for each property type
#Given the extensive results output, the TUKEY results were sorted for clarity.
#For each property type, the differences were sorted, filtered for results with adjusted p-value <=0.05 and then ranked accordingly
#At 95% confidence intervals:
# For property type - apartments - Orchard has the most significantly different differences between means in transacted prices. Newton and Downtown core are ranked 2nd.
# For property type - Condominium - Newton and Tanglin have the most significantly different differences between means in transacted prices. Orchard and River Valley come in 2nd.
# For property type - EC - the Bishan has the most significantly different differences between means in transacted prices. Ang Mo Kio and Bukit Batok are ranked 2nd and 3rd respectively.
# For property type - Detached House - Newton has the most significantly different differences between means in transacted prices. Tanglin and Southern islands (Sentosa) / Bukit Timah / Novena / Marine Parade are ranked 2nd and 3rd respectively.
# For property type - Semi-Detached House - Tanglin has the most significantly different differences between means in transacted prices. River Valley is ranked 2nd.
# For property type - Terrance House - Newton has the most significantly different differences between means in transacted prices. Tanglin and Novena areas are joint 2nd.
mrt <- read.csv("data/Planning_area_mrt_stations.csv")
mrt$Planning.Area <- toupper(mrt$ï..Planning.Area)
dgp <- centroids <- readOGR("data/MP14_PLNG_AREA_WEB_PL.shp")
OGR data source with driver: ESRI Shapefile
Source: "D:\ASSR\TakeHome\data\MP14_PLNG_AREA_WEB_PL.shp", layer: "MP14_PLNG_AREA_WEB_PL"
with 55 features
It has 12 fields
dgp <- spTransform(dgp, CRS("+proj=longlat +ellps=GRS80"))
mrt2 <- merge(dgp,mrt, by.x ='PLN_AREA_N', by.y = 'Planning.Area')
#All MRT Stations
mrt_pal <- colorFactor(palette= brewer.pal(15, 'RdYlGn'),
domain=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14),
na.color = "#00000000")
n too large, allowed maximum for palette RdYlGn is 11
Returning the palette you asked for with that many colors
leaflet(dgp) %>% addTiles() %>% addPolygons(fillColor = ~mrt_pal(mrt2$Total.no..of.stations),
weight = 2,
opacity = 1,
color = "grey",
dashArray = "1",
fillOpacity = 0.8) %>% addLegend("topright", mrt_pal, values=(0:14), title= "MRT Stations", labFormat =
labelFormat(suffix = " stations"))
#Only current MRT stations
mrt_pal2 <- colorFactor(palette= brewer.pal(12, 'RdYlGn'),
domain=c(0,1,2,3,4,5,6,7,8,9,10,11),
na.color = "#00000000")
n too large, allowed maximum for palette RdYlGn is 11
Returning the palette you asked for with that many colors
leaflet(dgp) %>% addTiles() %>% addPolygons(fillColor = ~mrt_pal2(mrt2$No..of.operational.stations),
weight = 2,
opacity = 1,
color = "grey",
dashArray = "1",
fillOpacity = 0.8) %>% addLegend("topright", mrt_pal2, values=(0:11), title= "MRT Stations", labFormat =
labelFormat(suffix = " stations"))
NA
#The optimal planning area(s) to purchase will depend on what we want out of the the property. If we are looking to purchase an upscale Flat for investment purposes, Newton could be an option but the mean prices are high and mostly beyond our budget. There is also only 1 MRT station and the properties available are limited. A better alternative planning area would be River Valley as it has one of the highest number of available “Apartments” within $2.5 million. Although accessibility of River Valley is lower than some planning areas with only 1 MRT station, it has 2 more future MRT stations planned (potentially increasing the value) and it is relatively close to the CBD. #If we are looking to buy a flat to live in, we will be more concerned sbout the accessibility as well as affordability. In this regard, we would prefer planning areas like Bedok or Toa Payoh as they are mature estates with high levels of accessibility and amenities. Both Toa Payoh and Bedok have a Integrated Transport Hub and relatively high number of MRT stations, with Bedok having a total of 11 MRT stations (current and future) in total. In addition, the apartments are relatively affordable (xxx and yyy) and we could even purchase 2 if we wanted to. #Alternatively, if we are simply looking for a larger space at the cheapest possible rates and save the rest of the money for other purposes, the best planning areas would be Jurong East or Jurong West as they have the lowest psm (www and zzz). In addition, there are multiple malls available and decent accessibility (bus interchange and MRT). There will be a total of 15 MRT stations in the area in the near future.